<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
            "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>



<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<META name="GENERATOR" content="hevea 1.08">
<LINK rel="stylesheet" type="text/css" href="tutorial.css">
<TITLE>
Input/Output
</TITLE>
</HEAD>
<BODY >
<A HREF="tutorial006.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
<A HREF="tutorial008.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<HR>

<H1 CLASS="chapter"><A NAME="htoc60">Chapter&nbsp;6</A>&nbsp;&nbsp;Input/Output</H1><UL>
<LI><A HREF="tutorial007.html#toc36">Reading input data into data structures</A>
<LI><A HREF="tutorial007.html#toc37">How to use DCGs</A>
<LI><A HREF="tutorial007.html#toc38">Creating output data files</A>
<LI><A HREF="tutorial007.html#toc39">Good and bad data formats</A>
<LI><A HREF="tutorial007.html#toc40">Report generation</A>
</UL>

<A NAME="inputoutput"></A>
In this chapter we discuss input and output with the ECLiPSe system. We first discuss how to read data into an ECLiPSe program, and then different output methods. From this we extract some rules about good and bad data formats, which may be useful when defining a data exchange format between different applications. Finally, we briefly describe a simple report generator library used in RiskWise, which converts data lists into HTML reports.<BR>
<BR>
<A NAME="toc36"></A>
<H2 CLASS="section"><A NAME="htoc61">6.1</A>&nbsp;&nbsp;Reading input data into data structures</H2>
<A NAME="ReadingInput"></A>
The easiest way to read data <A NAME="@default185"></A>into ECLiPSe programs is to use the Prolog term format for the data. Each term is terminated by a fullstop<A NAME="@default186"></A>, which is a dot (.) followed by some white space. The following code reads terms from a file until the end of file is encountered and returns the terms in a list.
<PRE CLASS="verbatim">
:-mode read_data(++,-).
read_data(File,Result):-
        open(File,read,S),
        read(S,X),
        read_data_lp(S,X,Result),
        close(S).

% read/2 returns the atom end_of_file once no more terms can be read
read_data_lp(_S,end_of_file,[]):-
        !.
read_data_lp(S,X,[X|R]):-
        read(S,Y),
        read_data_lp(S,Y,R).
</PRE>This method is very easy to use if both the source and the sink of the data are ECLiPSe programs. Unfortunately, data provided by other applications will normally not be in Prolog term format, so for such data we have to use other techniques<SUP><A NAME="text8" HREF="#note8">1</A></SUP>.<BR>
<BR>
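As an illustration of the term format, assume a hypothetical file <I>routers.dat</I> (name and contents are invented for this example) with the following contents:
<PRE CLASS="verbatim">
router(r1, 10, 20).
router('Router 2', 5, 7).
link(r1, 'Router 2').
</PRE>The goal <CODE>read_data("routers.dat",L)</CODE> then binds <I>L</I> to the list <CODE>[router(r1,10,20), router('Router 2',5,7), link(r1,'Router 2')]</CODE>. Note that a term in the file consisting only of the atom <I>end_of_file</I> could not be distinguished from the real end of the file, and would stop the loop early.<BR>
<BR>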
<A NAME="toc37"></A>
<H2 CLASS="section"><A NAME="htoc62">6.2</A>&nbsp;&nbsp;How to use DCGs</H2>
<A NAME="howtousedcgs"></A>
In this section we describe the use of tokenizers and DCGs <A NAME="@default187"></A><A NAME="@default188"></A>(definite clause grammars) to build a very flexible input system. The input routine of the NDI mapper<A NAME="@default189"></A> of RiskWise is implemented entirely with this method, and we use some of its code in the examples below.<BR>
<BR>
In this approach data input is split into two parts, a tokenizer and a parser. The tokenizer reads the input and splits it into tokens, each of which corresponds to one field of a data item. The parser recognizes the structure of the data and groups all fields belonging to one item together.<BR>
<BR>
Using these techniques to read simple data files is a bit of overkill: they are much more powerful, and are used, for example, to read ECLiPSe terms themselves. However, given the right grammar, this process is very fast, and the resulting reader is extremely easy to modify and extend.<BR>
<BR>
The top level routine <I>read_file</I> <A NAME="@default190"></A>opens the file, calls the tokenizer, closes the file and starts the parser. Passing the empty list as the third argument of <I>phrase</I><A NAME="@default191"></A> requires that the complete token list is parsed; if any input remains unread, <I>read_file</I> simply fails. Normally, we would instead check the unread part and produce an error message.
<PRE CLASS="verbatim">
:-mode read_file(++,-).
read_file(File,Term):-
        open(File,'read',S),
        tokenizer(S,1,L),
        close(S),
        phrase(file(Term),L,[]).
</PRE>
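The check mentioned above could be sketched as follows, assuming the tokenizer and grammar rules defined later in this section. The names <I>read_file_checked</I> and <I>first_line_number</I> are hypothetical. Instead of passing the empty list to <I>phrase</I>, the sketch inspects the unparsed remainder and uses the <I>end_of_line(N)</I> markers inserted by the tokenizer to report the line on which parsing stopped:
<PRE CLASS="verbatim">
% Hypothetical variant of read_file/2: instead of requiring that the
% whole token list is parsed, inspect the unparsed remainder Rest.
:-mode read_file_checked(++,-).
read_file_checked(File,Term):-
        open(File,read,S),
        tokenizer(S,1,L),
        close(S),
        phrase(file(Term),L,Rest),
        (Rest == [] -&gt;
            true
        ;
            % the first end_of_line(N) marker in Rest ends the line
            % on which parsing stopped
            first_line_number(Rest,N),
            printf(error,"%w: cannot parse line %w%n",[File,N]),
            fail
        ).

first_line_number([end_of_line(N)|_],N):-
        !.
first_line_number([_|T],N):-
        first_line_number(T,N).
first_line_number([],unknown).
</PRE>If even the header lines do not match, <I>phrase</I> itself fails, and this sketch fails without a message.<BR>
<BR>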
The tokenizer<A NAME="@default192"></A> is a bit complicated, since end-of-line<A NAME="@default193"></A> markers are significant in our NDI data format, and the format does not distinguish between lower and upper case spelling. Otherwise, we might be able to use the built-in tokenizer of ECLiPSe (predicate <I>read_token</I><A NAME="@default194"></A>).<BR>
<BR>
The tokenizer reads the input one line at a time as a string, splits the line into white space separated strings, eliminates any empty strings and returns the rest as tokens. After the tokens of each line it inserts an <I>end_of_line(N)</I> token, where <I>N</I> is the current line number. The line number can be used to produce meaningful error messages if parsing fails.<BR>
<BR>
The output of the tokenizer is therefore a list of strings intermixed with end-of-line markers.
<A NAME="@default195"></A>
<PRE CLASS="verbatim">
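% tokenizer(+Stream, +LineNumber, -Tokens): tokenize Stream line by line;
% once no further line can be read, the second clause ends the token list
:-mode tokenizer(++,++,-).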
tokenizer(S,N,L):-
        read_string(S,'end_of_line',_,Line),
        !,
        open(string(Line),read,StringStream),
        tokenizer_string(S,N,StringStream,L).
tokenizer(_S,_N,[]).

tokenizer_string(S,N,StringStream,[H|T]):-
        non_empty_string(StringStream,H),
        !,
        tokenizer_string(S,N,StringStream,T).
tokenizer_string(S,N,StringStream,[end_of_line(N)|L]):-
        close(StringStream),
        N1 is N+1,
        tokenizer(S,N1,L).

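% Return the next white space delimited field of the line, skipping the
% empty strings produced by consecutive separators; fails once the line
% has been consumed
non_empty_string(Stream,Token):-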
        read_string(Stream, " \t\r\n", _, Token1),
        (Token1 = "" -&gt;
            non_empty_string(Stream,Token)
        ;
            Token = Token1
        ).
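
% Illustration with a hypothetical input file: for the three lines
%     RouterTrafficSample
%     #
%     1159000000 r1 10 20 30 40
% the call tokenizer(S,1,L) should bind L to
%     ["RouterTrafficSample", end_of_line(1),
%      "#", end_of_line(2),
%      "1159000000", "r1", "10", "20", "30", "40", end_of_line(3)]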
</PRE>We now show an example of the grammar rules that define one of the data files of the NDI mapper, the RouterTrafficSample<A NAME="@default196"></A> file. The grammar for this file format is shown below:
<PRE CLASS="verbatim">
file              := &lt;file_type_line&gt; 
                     &lt;header_break&gt;
                     [&lt;data_line&gt;]* 
file_type_line    := RouterTrafficSample &lt;newline&gt; 
header_break      := # &lt;newline&gt; 
data_line         := &lt;timestamp&gt; 
                     &lt;router_name&gt; 
                     &lt;tcp_segments_in&gt; 
                     &lt;tcp_segments_out&gt; 
                     &lt;udp_datagrams_in&gt; 
                     &lt;udp_datagrams_out&gt; 
                     &lt;newline&gt; 
timestamp         := &lt;timestamp_ms&gt; 
router_name       := &lt;name_string&gt;  
tcp_segments_in   := integer 
tcp_segments_out  := integer 
udp_datagrams_in  := integer 
udp_datagrams_out := integer 
</PRE>
This grammar translates directly into a DCG representation. The start symbol<A NAME="@default197"></A> of the grammar is <I>file(X)</I>; its argument <I>X</I> is bound to the parse tree<A NAME="@default198"></A> of the input. Each rule uses the symbol <CODE>--&gt;</CODE> to separate the rule head on the left side from its body on the right. All rules for this particular data file replace one non-terminal symbol<A NAME="@default199"></A><A NAME="@default200"></A> with one or more non-terminal symbols. The rule arguments are used to assemble the parse tree. For this data file, the parse tree is a term <I>router_traffic_sample(L)</I>, where <I>L</I> is a list of terms <I>router_traffic_sample(A,B,C,D,E,F)</I> whose arguments are simple values: the converted time stamp, the router name as an atom, and four integers.
<PRE CLASS="verbatim">
file(X) --&gt; router_traffic_sample(X).

router_traffic_sample(router_traffic_sample(L)) --&gt; 
        file_type_line("RouterTrafficSample"),
        header_break,
        router_traffic_sample_data_lines(L).

router_traffic_sample_data_lines([H|T]) --&gt; 
        router_traffic_sample_data_line(H), 
        !,
        router_traffic_sample_data_lines(T).
router_traffic_sample_data_lines([]) --&gt; [].

router_traffic_sample_data_line(
            router_traffic_sample(A,B,C,D,E,F)) --&gt; 
        time_stamp(A),
        router_name(B),
        tcp_segments_in(C),
        tcp_segments_out(D),
        udp_datagrams_in(E),
        udp_datagrams_out(F),
        new_line.

tcp_segments_in(X) --&gt; integer(X).

tcp_segments_out(X) --&gt; integer(X).

udp_datagrams_in(X) --&gt; integer(X).

udp_datagrams_out(X) --&gt; integer(X).
</PRE>Note the cut in the definition of <I>router_traffic_sample_data_lines</I>, which commits to the first rule once a complete data line has been read. If a format error occurs in the file, reading stops at this point, and the remaining part of the input is returned in the third argument of <I>phrase</I><A NAME="@default201"></A>.<BR>
<BR>
The following rules define the basic symbols of the grammar. Terminal symbols<A NAME="@default202"></A><A NAME="@default203"></A> are placed in square brackets, while additional Prolog code is enclosed in braces<A NAME="@default204"></A>. The <I>time_stamp</I> rule, for example, reads one token <I>X</I>. It first checks that <I>X</I> is a string, then converts it to a number <I>N</I>, and finally uses a library predicate <I>eclipse_date</I><A NAME="@default205"></A> to convert <I>N</I> into a date representation <I>Date</I>, which is returned as the parse result. 
<PRE CLASS="verbatim">
file_type_line(X) --&gt; ndi_string(X), new_line.

header_break --&gt; 
        ["#"],
        new_line.

router_name(X) --&gt; name_string(X).

time_stamp(Date) --&gt; 
        [X],
        {string(X),
         number_string(N,X),
         eclipse_date(N,Date)
        }.

integer(N) --&gt; [X],{string(X),number_string(N,X),integer(N)}.

name_string(A) --&gt; ndi_string(X),{atom_string(A,X)}.

ndi_string(X) --&gt; [X],{string(X)}.

new_line --&gt; [end_of_line(_)].
</PRE>These primitives are reused for all files, so adding support for a new file format essentially requires only some rules defining the format.<BR>
<BR>
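As an illustration, consider a hypothetical file type <I>RouterLoadSample</I> (invented for this example, not one of the NDI mapper files) whose data lines contain a router name and an integer load value. Using the primitives above, its grammar might be written as:
<PRE CLASS="verbatim">
% Hypothetical format, for illustration only:
%   RouterLoadSample &lt;newline&gt;
%   # &lt;newline&gt;
%   [&lt;router_name&gt; &lt;load&gt; &lt;newline&gt;]*
router_load_sample(router_load_sample(L)) --&gt;
        file_type_line("RouterLoadSample"),
        header_break,
        router_load_lines(L).

router_load_lines([H|T]) --&gt;
        router_load_line(H),
        !,
        router_load_lines(T).
router_load_lines([]) --&gt; [].

% only the shared primitives are used to read the fields
router_load_line(router_load(Name,Load)) --&gt;
        router_name(Name),
        integer(Load),
        new_line.
</PRE>To make <I>read_file</I> accept this format as well, a second clause <CODE>file(X) --&gt; router_load_sample(X).</CODE> could be added for the start symbol.<BR>
<BR>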
<A NAME="toc38"></A>
<H2 CLASS="section"><A NAME="htoc63">6.3</A>&nbsp;&nbsp;Creating output data files</H2>
In this section we discuss how to generate output data files<A NAME="@default206"></A>. We present three methods which implement different output formats.<BR>
<BR>

<H3 CLASS="subsection"><A NAME="htoc64">6.3.1</A>&nbsp;&nbsp;Creating Prolog data</H3>
We first look at the special case where we want to create a file which can be read back with the input routine shown in section <A HREF="#ReadingInput">6.1</A>. The predicate <I>output_data</I> writes each item of a list of terms on its own line of the output file, terminated by a fullstop. The predicate <I>writeq</I><A NAME="@default207"></A> ensures that atoms are quoted where necessary, operator definitions<A NAME="@default208"></A> are handled correctly, etc., so that the terms can be read back.
<PRE CLASS="verbatim">
:-mode output_data(++,+).
output_data(File,L):-
        open(File,'write',S),
        (foreach(X,L),
         param(S) do
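           % writeln/2 writes to S; the space before the dot prevents it
           % from merging with a term that ends in a symbol character
           writeq(S,X),writeln(S,' .')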
        ),
        close(S).
</PRE>Unbound constrained variables cannot be written to a file and loaded later: they are not re-created with their previous state and constraints. In general, it is a good idea to restrict the data format to ground terms<A NAME="@default209"></A>, i.e. terms that do not contain any variables.<BR>
<BR>
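One way to enforce this restriction is to test the data with the built-in <I>ground/1</I> before writing it. The following sketch combines this test with the output loop shown above; the name <I>output_ground_data</I> and the error message are our own choices:
<PRE CLASS="verbatim">
% Sketch: write a list of terms in Prolog format, but only if the data
% contains no variables (variables and their constraints cannot be
% restored by read/2).
:-mode output_ground_data(++,+).
output_ground_data(File,L):-
        (ground(L) -&gt;
            open(File,write,S),
            (foreach(X,L),
             param(S) do
                writeq(S,X),
                writeln(S,' .')
            ),
            close(S)
        ;
            printf(error,"output_ground_data: non-ground data, %w not written%n",[File]),
            fail
        ).
</PRE>The test is applied once to the whole list, so nothing is written if any item contains a variable.<BR>
<BR>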

<H3 CLASS="subsection"><A NAME="htoc65">6.3.2</A>&nbsp;&nbsp;Simple tabular format</H3>
Generating data in Prolog format is easy if the receiver of the data is another ECLiPSe program, but may be inconvenient if the receiver is written in another language. In that case a tabular format<A NAME="@default210"></A> that can be read with routines like <I>scanf</I><A NAME="@default211"></A> is easier to handle. The following program segment shows how this is done: for each item in the list we extract the relevant arguments and write them to the output file, separated by white space.
<PRE CLASS="verbatim">
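% assumed declaration of the data_item structure used below
:- local struct(data_item(attr1,attr2)).

:-mode output_data(++,+).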
output_data(File,L):-
        open(File,'write',S),
        (foreach(X,L),
         param(S) do
            output_item(S,X)
        ),
        close(S).

output_item(S,data_item with [attr1:A1,attr2:A2]):-
        write(S,A1),
        write(S,' '),
        write(S,A2),
        nl(S).
</PRE>We use the predicate <I>write</I><A NAME="@default212"></A> to print out the individual fields. As this predicate handles arbitrary argument types, this is quite simple, but it does not give us much control over the format of the fields.<BR>
<BR>

<H3 CLASS="subsection"><A NAME="htoc66">6.3.3</A>&nbsp;&nbsp;Using <I>printf</I></H3>
Instead, we can use the predicate <I>printf</I><A NAME="@default213"></A>, which behaves much like the C library function of the same name. For each argument we must specify the argument type and an optional format<A NAME="@default214"></A>. If we make a mistake in the format specification, a run-time error results.
<PRE CLASS="verbatim">
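% assumed declaration of the data_item structure used below
:- local struct(data_item(attr1,attr2)).

:-mode output_data(++,+).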
output_data(File,L):-
        open(File,'write',S),
        (foreach(X,L),
         param(S) do
            output_item(S,X)
        ),
        close(S).

output_item(S,data_item with [attr1:A1,attr2:A2]):-
        printf(S,"%s %d\n",[A1,A2]).
</PRE>We can use the same schema for creating tab-separated<A NAME="@default215"></A> or comma-separated files<A NAME="@default216"></A>, which provide a simple interface to spreadsheets<A NAME="@default217"></A> and database readers<A NAME="@default218"></A>.<BR>
<BR>
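For example, a comma-separated version of the previous program only needs a different format string. In this sketch the names <I>output_csv</I> and <I>output_csv_item</I> and the header line are hypothetical, and the structure declaration is the same assumed one as above (it should appear only once per module). The first field is printed with <CODE>%w</CODE>, which accepts any term, and no quoting is done, so the fields must not themselves contain commas:
<PRE CLASS="verbatim">
% assumed structure declaration; omit if already declared in this module
:- local struct(data_item(attr1,attr2)).

:-mode output_csv(++,+).
output_csv(File,L):-
        open(File,write,S),
        writeln(S,"attr1,attr2"),       % optional header line
        (foreach(X,L),
         param(S) do
            output_csv_item(S,X)
        ),
        close(S).

% one line per item: first field as any term, second as an integer
output_csv_item(S,data_item with [attr1:A1,attr2:A2]):-
        printf(S,"%w,%d%n",[A1,A2]).
</PRE>Replacing the comma by <CODE>\t</CODE> in the format string gives a tab-separated file instead.<BR>
<BR>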
<A NAME="toc39"></A>
<H2 CLASS="section"><A NAME="htoc67">6.4</A>&nbsp;&nbsp;Good and bad data formats</H2>
When defining the data format for an input or output file, we should choose a representation which suits the ECLiPSe application. Table <A HREF="#InputFormats">6.1</A> shows good and bad formats<A NAME="@default219"></A>. Prolog terms are very easy to read and to write; simple tabular formats are easy to write, but more complex to read. Comma-separated files need a special tokenizer which splits fields at comma characters. The most complex input format is a fixed column format<A NAME="@default220"></A><A NAME="@default221"></A>, as generated for example by FORTRAN<A NAME="@default222"></A> applications. We should avoid such data formats as input if possible, since reading them requires significant development effort.
<BLOCKQUOTE CLASS="table"><DIV CLASS="center"><HR WIDTH="80%" SIZE=2></DIV>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD ALIGN=left NOWRAP>Format</TD>
<TD ALIGN=right NOWRAP>Input</TD>
<TD ALIGN=right NOWRAP>Output</TD>
</TR>
<TR><TD ALIGN=left NOWRAP>Prolog terms</TD>
<TD ALIGN=right NOWRAP>++</TD>
<TD ALIGN=right NOWRAP>++</TD>
</TR>
<TR><TD ALIGN=left NOWRAP>EXDR</TD>
<TD ALIGN=right NOWRAP>++</TD>
<TD ALIGN=right NOWRAP>++</TD>
</TR>
<TR><TD ALIGN=left NOWRAP>White space separated</TD>
<TD ALIGN=right NOWRAP>+</TD>
<TD ALIGN=right NOWRAP>++</TD>
</TR>
<TR><TD ALIGN=left NOWRAP>Comma separated</TD>
<TD ALIGN=right NOWRAP>-</TD>
<TD ALIGN=right NOWRAP>++</TD>
</TR>
<TR><TD ALIGN=left NOWRAP>Fixed columns</TD>
<TD ALIGN=right NOWRAP>- -</TD>
<TD ALIGN=right NOWRAP>+</TD>
</TR></TABLE>
<BR>
<BR>
<DIV CLASS="center">Table 6.1: <A NAME="InputFormats"></A>Good and bad input formats</DIV><BR>
<BR>

<DIV CLASS="center"><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
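The special tokenizer needed for comma-separated input can be sketched with the built-in <I>split_string/4</I>, which splits a string at the given separator characters and strips the given padding characters from both ends of each field. Each line could be read with <I>read_string/4</I>, as in the tokenizer of section 6.2. The name <I>line_values</I> is hypothetical, and the sketch ignores quoted fields containing commas, which real comma-separated files may use. Empty fields are kept, since their position matters, and numeric fields are converted to numbers:
<PRE CLASS="verbatim">
% Split one input line (a string) at commas, strip surrounding white
% space from each field, and convert numeric fields to numbers.
:-mode line_values(++,-).
line_values(Line,Values):-
        split_string(Line,","," \t\r",Fields),
        (foreach(F,Fields),
         foreach(V,Values) do
            (number_string(N,F) -&gt; V = N ; V = F)
        ).
</PRE>For example, <CODE>line_values("r1, 10,,20",V)</CODE> binds <I>V</I> to <CODE>["r1",10,"",20]</CODE>.<BR>
<BR>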
<A NAME="toc40"></A>
<H2 CLASS="section"><A NAME="htoc68">6.5</A>&nbsp;&nbsp;Report generation</H2>
There is another output format that can be generated quite easily. RiskWise uses a <I>report</I><A NAME="@default223"></A> library, which presents lists of items as HTML<A NAME="@default224"></A> tables in hyperlinked files. This format is very useful for presenting data in a human-readable form, as it allows navigation<A NAME="@default225"></A> across files and sorting of tables by different columns. Figure <A HREF="#HTMLReport">6.1</A> shows an example from the RiskWise application.
<BLOCKQUOTE CLASS="figure"><DIV CLASS="center"><HR WIDTH="80%" SIZE=2></DIV>
<BR>
<BR>
<DIV CLASS="center">Figure 6.1: <A NAME="HTMLReport"></A>HTML Report</DIV><BR>
<BR>

<DIV CLASS="center"><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
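The report library itself is specific to RiskWise and is not described here. The basic idea of presenting a list of items as an HTML table can, however, be sketched in a few lines. The names <I>html_table</I> and <I>html_row</I> are hypothetical; unlike the report library, this sketch creates neither hyperlinks nor sortable columns, and it does not escape HTML special characters such as &lt; and &amp; in cell values:
<PRE CLASS="verbatim">
% Hypothetical sketch, not the RiskWise report library: write Header
% (a list of column titles) and Rows (a list of lists of cell values)
% to File as a single HTML table.
:-mode html_table(++,++,++).
html_table(File,Header,Rows):-
        open(File,write,S),
        writeln(S,"&lt;table border=1&gt;"),
        html_row(S,th,Header),
        (foreach(Row,Rows),
         param(S) do
            html_row(S,td,Row)
        ),
        writeln(S,"&lt;/table&gt;"),
        close(S).

% Write one table row, wrapping each cell in the given tag (th or td).
html_row(S,Tag,Cells):-
        write(S,"&lt;tr&gt;"),
        (foreach(Cell,Cells),
         param(S,Tag) do
            printf(S,"&lt;%w&gt;%w&lt;/%w&gt;",[Tag,Cell,Tag])
        ),
        writeln(S,"&lt;/tr&gt;").
</PRE>For example, <CODE>html_table("report.html",[router,load],[[r1,42],[r2,17]])</CODE> writes a table with one header row and two data rows.<BR>
<BR>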
<HR WIDTH="50%" SIZE=1><DL CLASS="list"><DT CLASS="dt-list"><A NAME="note8" HREF="#text8"><FONT SIZE=5>1</FONT></A><DD CLASS="dd-list">At this point we should again mention the EXDR format, which can easily be read into ECLiPSe, and which is usually simpler to generate from other languages than the canonical Prolog format.
</DL>
<HR>
</BODY>
</HTML>
